Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

Abstract

We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models is very restrictive. In this paper, we propose a provably efficient RL algorithm for the MDP whose state transition is given by a multinomial logistic model. We show that our proposed algorithm based on upper confidence bounds achieves an O(d√(H^3 T)) regret bound, where d is the dimension of the transition core, H is the horizon, and T is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic function approximation with provable guarantees. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms the existing methods, hence achieving both provable efficiency and practical superior performance.
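The abstract does not spell out the transition model, so the following is only a minimal sketch of what a multinomial logistic (softmax) transition probability usually looks like: the probability of each candidate next state is proportional to the exponential of the inner product between a feature vector and the unknown parameter. The function name `mnl_transition_probs`, the feature layout, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import math

def mnl_transition_probs(features, theta):
    """Softmax over candidate next states (illustrative sketch only).

    features: one feature vector per candidate next state, standing in
              for a hypothetical phi(s, a, s') of dimension d.
    theta:    parameter vector of dimension d (the "transition core").
    """
    # Linear score for each candidate next state.
    logits = [sum(f_i * t_i for f_i, t_i in zip(f, theta)) for f in features]
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example with d = 2 and three candidate next states (made-up numbers).
theta = [0.5, -1.0]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = mnl_transition_probs(features, theta)
```

Here `probs` is a valid probability distribution over the three candidate next states, and a state with a larger score under `theta` receives a larger probability. In a learning setting `theta` would be unknown and estimated from observed transitions; that estimation step is not shown.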

Similar resources

Residual Algorithms: Reinforcement Learning with Function Approximation

A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general function-approximation system, such as a sigmoidal multilayer perceptron, a radial-basis-function system, a memory-based learning syst...

Reinforcement Learning and Function Approximation

Relational reinforcement learning combines traditional reinforcement learning with a strong emphasis on a relational (rather than attribute-value) representation. Earlier work used relational reinforcement learning on a learning version of the classic Blocks World planning problem (a version where the learner does not know what the result of taking an action will be). “Structural” learning resu...

Fuzzy Kanerva-based function approximation for reinforcement learning

Radial Basis Functions and Kanerva Coding can give poor performance when applied to large-scale multi-agent systems. In this paper, we attempt to solve a collection of predator-prey pursuit instances and argue that the poor performance is caused by frequent prototype collisions. We show that dynamic prototype allocation and adaptation can give better results by reducing these collisions. We the...

Function Approximation in Hierarchical Relational Reinforcement Learning

Recently there have been a number of different approaches developed for hierarchical reinforcement learning in the propositional setting. We propose a hierarchical version of relational reinforcement learning (HRRL). We describe a value function approximation method inspired by logic programming which is suitable for HRRL.

Decision Tree Function Approximation in Reinforcement Learning

We present a decision-tree-based approach to function approximation in reinforcement learning. We compare our approach with table lookup and a neural network function approximator on three problems: the well-known mountain car and pole balance problems as well as a simulated automobile race car. We find that the decision tree can provide better learning performance than the neural network funct...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i7.25964